The Effect of Maternal Tobacco Smoke Exposure on the Placental Transcriptome

Study Description

  • Gene expression profiling on placentas from women exposed to smoke during pregnancy and women not exposed to smoke

  • Raw data

    • Sample_ID

    • Characteristics: e.g. age, bmi, week of delivery, and smoking status

    • Gene probes

  • Annotation file

Data Cleaning and Wrangling

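library(readr)  # read_tsv(), read_csv()
library(here)   # here() builds paths relative to the project root
ncbi_data <- read_tsv(here("data/01_ncbi_data.tsv.gz"))  # Read the gzipped TSV file (readr decompresses .gz automatically)
print(head(ncbi_data, 10))  # Print the first 10 rows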
# A tibble: 10 × 77
   col_1    col_2 col_3 col_4 col_5 col_6 col_7 col_8 col_9 col_10 col_11 col_12
   <chr>    <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>  <chr>  <chr> 
 1 !Series… Effe… <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>   <NA>   <NA>  
 2 !Series… GSE1… <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>   <NA>   <NA>  
 3 !Series… Publ… <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>   <NA>   <NA>  
 4 !Series… Sep … <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>   <NA>   <NA>  
 5 !Series… Jan … <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>   <NA>   <NA>  
 6 !Series… 2009… <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>   <NA>   <NA>  
 7 !Series… Smok… <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>   <NA>   <NA>  
 8 !Series… The … <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>   <NA>   <NA>  
 9 !Series… Expr… <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>   <NA>   <NA>  
10 !Series… Hana… <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>   <NA>   <NA>  
# ℹ 65 more variables: col_13 <chr>, col_14 <chr>, col_15 <chr>, col_16 <chr>,
#   col_17 <chr>, col_18 <chr>, col_19 <chr>, col_20 <chr>, col_21 <chr>,
#   col_22 <chr>, col_23 <chr>, col_24 <chr>, col_25 <chr>, col_26 <chr>,
#   col_27 <chr>, col_28 <chr>, col_29 <chr>, col_30 <chr>, col_31 <chr>,
#   col_32 <chr>, col_33 <chr>, col_34 <chr>, col_35 <chr>, col_36 <chr>,
#   col_37 <chr>, col_38 <chr>, col_39 <chr>, col_40 <chr>, col_41 <chr>,
#   col_42 <chr>, col_43 <chr>, col_44 <chr>, col_45 <chr>, col_46 <chr>, …
characteristics_data <- read_tsv(here("data/02_dat_clean_characteristics.tsv"))  # Read characteristics data
print(head(characteristics_data, 10))  # Print the first 10 rows to display
# A tibble: 10 × 8
   sample_id age_years status     maternal_bmi parity gestational_age_weeks
   <chr>         <dbl> <chr>             <dbl>  <dbl>                 <dbl>
 1 GSM451162        32 non-smoker         25.7      1                    39
 2 GSM451163        28 non-smoker         18.8      1                    37
 3 GSM451164        38 non-smoker         21.3      2                    36
 4 GSM451165        34 non-smoker         23.5      3                    41
 5 GSM451166        27 smoker             22        2                    42
 6 GSM451167        31 non-smoker         31.8      2                    39
 7 GSM451168        31 non-smoker         25.6      1                    39
 8 GSM451169        32 non-smoker         19.8      1                    39
 9 GSM451170        32 non-smoker         18.6      2                    41
10 GSM451171        33 non-smoker         23.7      3                    41
# ℹ 2 more variables: mode_of_delivery <chr>, placental_volume_cm3 <dbl>
genes_data <- read_tsv(here("data/02_dat_clean_genes.tsv"))  # Read gene data
print(head(genes_data, 10))  # Print the first 10 rows to display
# A tibble: 10 × 11,156
   sample_id ilmn_1698220_phtf2 ilmn_1784717_rps19 ilmn_1761911_scyl1bp1
   <chr>                  <dbl>              <dbl>                 <dbl>
 1 GSM451162               960.               292.                  164.
 2 GSM451163               915.               265.                  184.
 3 GSM451164               983.               217.                  181.
 4 GSM451165              1005.               358.                  197.
 5 GSM451166               866.               241.                  154.
 6 GSM451167               863.               216.                  182.
 7 GSM451168               815.               239.                  219.
 8 GSM451169              1006.               244.                  196.
 9 GSM451170               802.               323.                  172.
10 GSM451171               856.               260.                  227.
# ℹ 11,152 more variables: ilmn_1706784_h2afv <dbl>,
#   ilmn_1815346_tmem136 <dbl>, ilmn_1764927_cdc42ep1 <dbl>,
#   ilmn_1776347_tcp1 <dbl>, ilmn_1660661_tcp1 <dbl>,
#   ilmn_1692844_tbc1d19 <dbl>, ilmn_1769158_isoc2 <dbl>,
#   ilmn_1685237_flj20718 <dbl>, ilmn_1714905_loc644563 <dbl>,
#   ilmn_1797082_snx13 <dbl>, ilmn_1684031_myo1a <dbl>,
#   ilmn_1693210_nsmce2 <dbl>, ilmn_1651800_gstm4 <dbl>, …
clean_data <- read_tsv(here("data/02_dat_clean.tsv"))  # Read clean merged data
print(head(clean_data, 10))  # Print the first 10 rows to display
# A tibble: 10 × 11,163
   sample_id age_years status     maternal_bmi parity gestational_age_weeks
   <chr>         <dbl> <chr>             <dbl>  <dbl>                 <dbl>
 1 GSM451162        32 non-smoker         25.7      1                    39
 2 GSM451163        28 non-smoker         18.8      1                    37
 3 GSM451164        38 non-smoker         21.3      2                    36
 4 GSM451165        34 non-smoker         23.5      3                    41
 5 GSM451166        27 smoker             22        2                    42
 6 GSM451167        31 non-smoker         31.8      2                    39
 7 GSM451168        31 non-smoker         25.6      1                    39
 8 GSM451169        32 non-smoker         19.8      1                    39
 9 GSM451170        32 non-smoker         18.6      2                    41
10 GSM451171        33 non-smoker         23.7      3                    41
# ℹ 11,157 more variables: mode_of_delivery <chr>, placental_volume_cm3 <dbl>,
#   ilmn_1698220_phtf2 <dbl>, ilmn_1784717_rps19 <dbl>,
#   ilmn_1761911_scyl1bp1 <dbl>, ilmn_1706784_h2afv <dbl>,
#   ilmn_1815346_tmem136 <dbl>, ilmn_1764927_cdc42ep1 <dbl>,
#   ilmn_1776347_tcp1 <dbl>, ilmn_1660661_tcp1 <dbl>,
#   ilmn_1692844_tbc1d19 <dbl>, ilmn_1769158_isoc2 <dbl>,
#   ilmn_1685237_flj20718 <dbl>, ilmn_1714905_loc644563 <dbl>, …
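
The merged table above has 11,163 columns, which matches joining the 8 characteristics columns to the 11,156 gene columns on the shared sample_id key (8 + 11,156 - 1). A minimal sketch of such a join, using made-up toy tables and base R's merge() rather than the actual cleaning script (which is not shown here):

```r
# Toy stand-ins for the two cleaned tables; all values are made up for illustration
characteristics <- data.frame(
  sample_id = c("S1", "S2", "S3"),
  status    = c("non-smoker", "smoker", "non-smoker"),
  age_years = c(32, 27, 38)
)
genes <- data.frame(
  sample_id = c("S1", "S2", "S3"),
  probe_a   = c(960, 866, 983),
  probe_b   = c(292, 241, 217)
)

# Inner join on the shared key; sample_id appears only once in the result
merged <- merge(characteristics, genes, by = "sample_id")

# Column check: 3 + 3 - 1 shared key column
ncol(merged)
```

The same column arithmetic is a quick sanity check after any join: an unexpected column count usually points to a duplicated or mismatched key.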

Description of Data

summary_data <- read_csv(here("results/description_table.csv"))  # Read summary table
print(summary_data)  # Print table
# A tibble: 5 × 3
  ...1                                    `non-smoker`   smoker        
  <chr>                                   <chr>          <chr>         
1 Mothers in Cohort                       62             12            
2 Age in years median (range)             33 (23-46)     26 (22-36)    
3 Maternal BMI mean (range)               23 (15.9-34.9) 25 (18.6-34.1)
4 Parity mean                             2              2.2           
5 Gestational Age in weeks median (range) 39 (35-42)     40 (36-42)    
  • The table covers maternal characteristics that are important for pregnancy
  • Main difference is age: smokers are younger (median 26 vs. 33 years)
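
A "median (range)" cell like those in the table can be produced per smoking group with a small helper. This is a sketch under assumptions: the helper name median_range and the toy ages are hypothetical, and the code that built description_table.csv is not shown in these slides.

```r
# Toy data; ages are made up for illustration
characteristics <- data.frame(
  status    = c("non-smoker", "non-smoker", "non-smoker", "smoker", "smoker"),
  age_years = c(32, 28, 38, 27, 24)
)

# Hypothetical helper: format a numeric vector as "median (min-max)"
median_range <- function(x) {
  sprintf("%g (%g-%g)", median(x), min(x), max(x))
}

# One formatted cell per smoking status
summary_by_status <- tapply(
  characteristics$age_years,
  characteristics$status,
  median_range
)
print(summary_by_status)
```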

Description of Data

  • Little to no difference in placental volume
  • Surprising, since children of smokers tend to have a lower birth weight [1]
  • Possible lifestyle or physiological factors linked to smoking

[1] Kataoka, M. C., Carvalheira, A. P. P., Ferrari, A. P., Malta, M. B., de Barros Leite Carvalhaes, M. A., & de Lima Parada, C. M. G. (2018). Smoking during pregnancy and harm reduction in birth weight: a cross-sectional study. BMC Pregnancy and Childbirth, 18, 1-10.

Analysis of Data: Log2 Fold Expression

  • Bulletpoints

Analysis of Data: Linear Regression Analysis

  • One linear regression fitted per gene, with expression level as the response and smoking status as the predictor
  • Smoking status is coded 0 for non-smokers and 1 for smokers
  • Model: expression = a · status + b, where b is the intercept and a is the estimate (slope)
  • A positive estimate indicates upregulation of the gene in smokers
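
The per-gene model described above can be sketched in base R as follows. The data are simulated and the names fit_one, probe_up and probe_flat are hypothetical; this illustrates the approach and is not the analysis script itself.

```r
set.seed(1)

# Simulated cohort: 10 non-smokers (0) and 10 smokers (1)
n <- 20
status <- rep(c("non-smoker", "smoker"), each = n / 2)
smoker <- as.integer(status == "smoker")

# Two made-up probes: one shifted upwards in smokers, one unchanged
expr <- data.frame(
  probe_up   = rnorm(n, mean = 8 + 1.5 * smoker, sd = 0.5),
  probe_flat = rnorm(n, mean = 8, sd = 0.5)
)

# Hypothetical helper: fit expression ~ status for one probe and
# return the intercept (b), the estimate (a), its standard error and p-value
fit_one <- function(y, x) {
  coefs <- summary(lm(y ~ x))$coefficients
  data.frame(
    intercept = coefs["(Intercept)", "Estimate"],
    estimate  = coefs["x", "Estimate"],
    std_error = coefs["x", "Std. Error"],
    p_value   = coefs["x", "Pr(>|t|)"]
  )
}

# One model per probe, collected into one row per probe
results <- do.call(rbind, lapply(expr, fit_one, x = smoker))
results$probe <- names(expr)
print(results)
```

With 0/1 coding, the intercept equals the mean expression in non-smokers and the estimate equals the difference in mean expression between smokers and non-smokers, which is why its sign can be read directly as up- or downregulation.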

Analysis of Data: Forest Plot

Significant Values:

  • No significant q-values (q<0.05)
  • Significant p-values (p<0.05) found
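
Having p-values below 0.05 but no q-values below 0.05 is a common outcome once many tests are corrected for multiple testing. The slides do not say how the q-values were computed; the sketch below assumes a Benjamini-Hochberg adjustment via p.adjust() and uses made-up p-values.

```r
# Made-up p-values for eight hypothetical probes
p_values <- c(0.02, 0.03, 0.04, 0.20, 0.45, 0.60, 0.75, 0.90)

# Assumption: q-values are Benjamini-Hochberg adjusted p-values
q_values <- p.adjust(p_values, method = "BH")

# Count how many probes pass each threshold
n_sig_p <- sum(p_values < 0.05)
n_sig_q <- sum(q_values < 0.05)

print(data.frame(p_values, q_values))
```

In this toy example three probes have p<0.05 but none has q<0.05, mirroring the pattern reported on the slide.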

Forest Plot:

  • Significant estimates and corresponding error bars

  • Three clusters may suggest that genes within the same cluster share regulatory mechanisms

  • One cluster of genes is downregulated, while the other two are upregulated

Analysis of Data: Volcano Plot

  • Volcano plot with few data points
  • x-axis: estimates from linear regression analysis
  • y-axis: statistical significance of the effect sizes
    • Higher values correspond to lower p-values
  • Few genes exhibit significant downregulation, while others show significant upregulation
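
A volcano plot of this kind can be sketched from a table of estimates and p-values. The y-axis transform is an assumption: -log10(p) is the usual choice and fits "higher values, lower p-values", but the slides do not state it. The results table is made up, and base R graphics are used here only to keep the example dependency-free.

```r
# Made-up regression results for six hypothetical probes
results <- data.frame(
  probe    = paste0("probe_", 1:6),
  estimate = c(-1.2, -0.3, 0.1, 0.4, 0.9, 1.5),
  p_value  = c(0.004, 0.40, 0.85, 0.30, 0.03, 0.001)
)

# Assumed y-axis: -log10 of the p-value, so smaller p-values plot higher
results$neg_log10_p <- -log10(results$p_value)
results$significant <- results$p_value < 0.05

# Estimates on the x-axis, significance on the y-axis
plot(
  results$estimate, results$neg_log10_p,
  col  = ifelse(results$significant, "red", "grey40"),
  pch  = 19,
  xlab = "Estimate (smokers vs. non-smokers)",
  ylab = "-log10(p-value)"
)
abline(h = -log10(0.05), lty = 2)  # dashed line at p = 0.05
```

Points to the left of zero are downregulated in smokers and points to the right are upregulated; only points above the dashed line have p<0.05.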

Future Perspectives & Conclusion

Future Perspectives

Conclusion

  • Several gene probes show very similar expression levels, which gives several genes similar log2 fold changes and similar estimates in the linear regression
  • Nearly all genes show some difference in expression between smokers and non-smokers, but most differences are not significant: some genes are significantly up- or downregulated at p<0.05, and none remain significant at q<0.05